ABSTRACT


1. Introduction


This study proposes a novel neural method that improves translation prediction by incorporating lexical context
representation. The proposed model not only encodes the source language but also captures functional similarities,
resulting in improved translation accuracy [1].


The model is integrated into phrase-based translation (PBT) and hierarchical phrase-based translation (HPBT) 
models. Extensive experiments conducted on large-scale Chinese-to-English and English-to-German translation 
tasks demonstrate significant improvements [2-4]. The research findings indicate that integrating the DBiCS-based 
neural network model (DNNJM) into the decoding process of PBT and HPBT greatly enhances the performance of 
statistical machine translation (SMT) [5]. 


The evaluation metric is case-insensitive BLEU-4. On this metric the proposed model surpasses all baseline
systems, indicating superior translation quality [6]. DNNJM dynamically represents the context at different
translation time steps and leverages structural cues to improve translation. It is worth noting, however, that
Neural Machine Translation (NMT) still outperforms the proposed model, achieving a significantly higher BLEU score.
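Because the comparison above rests on case-insensitive BLEU-4, a small scorer helps make the metric concrete. This is a minimal sketch under stated assumptions: one reference per sentence, whitespace tokenization, lowercasing for case insensitivity, and no smoothing. It is not the scorer used in the cited work; reported scores are normally produced by standard tools such as multi-bleu.perl or sacreBLEU, which apply their own tokenization.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(hypotheses, references):
    """Case-insensitive corpus BLEU-4 with a single reference per sentence, no smoothing."""
    matches = [0] * 4
    totals = [0] * 4
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h = hyp.lower().split()  # lowercasing makes the score case-insensitive
        r = ref.lower().split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, 5):
            h_ng, r_ng = ngrams(h, n), ngrams(r, n)
            # Clipped (modified) n-gram matches.
            matches[n - 1] += sum(min(c, r_ng[g]) for g, c in h_ng.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # the geometric mean is zero without smoothing
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    # Brevity penalty punishes hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


print(corpus_bleu4(["The cat sat on the mat ."], ["the cat sat on the mat ."]))  # 1.0
```

Computing statistics over the whole corpus before taking the geometric mean, rather than averaging sentence scores, follows the usual definition of corpus-level BLEU.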


2. Experimental Setup


The proposed system is a Transformer-based NMT model with a self-attention mechanism in both the encoder and
decoder layers. It translates sentences in the source language (German) into the target language (English) [7].
The objective is to develop an end-to-end algorithm for language translation using NMT [8]. From the dataset,
50K sentence pairs are taken and split in an 80:20 ratio for training and evaluation, respectively.
Positional Encoding


In the Transformer architecture, an absolute positional encoding vector PE_j is added to the embedding of each
source and target word to denote its position j. This encoding is unique for each time step, generalizes to longer
sentences, and is deterministic [9].
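The PE_j vectors described above can be illustrated with the sinusoidal encoding of the original Transformer, which is deterministic and defined for any position. This is a minimal sketch assuming the system uses that variant: the base 10000 comes from the original Transformer formulation, d_model = 128 mirrors the embedding_dim hyper-parameter used later, and max_len = 50 is an arbitrary illustrative value.

```python
import math


def positional_encoding(max_len, d_model):
    """Sinusoidal absolute positional encodings.

    Row j is the vector PE_j added to the embedding of the word at position j:
    even dimensions use sine, odd dimensions use cosine.
    """
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def add_positional_encoding(embeddings, pe):
    """Add PE_j element-wise to the embedding of the j-th token."""
    return [[e + p for e, p in zip(row, pe[j])] for j, row in enumerate(embeddings)]


pe = positional_encoding(max_len=50, d_model=128)
print(pe[0][:4])  # [0.0, 1.0, 0.0, 1.0]
```

Because every position gets a different mix of frequencies, no two rows are identical, and the same function extends to sentences longer than any seen in training.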
Fuzzy Semantic Representation (FSR) for Rare Words
INPUT: Cleaned sentence containing out-of-vocabulary (OOV) words
OUTPUT: Embedding vectors


Figure 1. Fuzzy Semantic Representation (pipeline: clean sentence, construction of the word embedding tree, generation of FSRs for rare words, embedding)
Embedding Tree Construction Algorithm 
Begin 
1. Train word embeddings on monolingual data using the Word2Vec toolkit.
2. Cluster the OOV words that appear more than 3 times into M classes using k-means.
3. Class embedding = Mean(word embeddings in the class)
4. Each class j has an <UNK_j> node containing the remaining words that belong to this class, assigned by
cosine similarity.
5. <UNK_j> embedding = Mean(embeddings of the remaining words in the class)
End 
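The construction steps above can be sketched in a few dozen lines. This is a hypothetical illustration, not the authors' code, and it rests on several assumptions: the Word2Vec vectors from step 1 are already available as a dictionary, step 2 is read as clustering the words whose count exceeds 3 with plain Euclidean k-means, and step 4 is read as attaching every other word to the class whose embedding is most cosine-similar. The function names are invented for this sketch.

```python
import math
import random


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def kmeans(vectors, m, iters=20, seed=0):
    """Plain Euclidean k-means; returns one class label per input vector."""
    centroids = random.Random(seed).sample(vectors, m)
    labels = []
    for _ in range(iters):
        labels = [
            min(range(m), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        for c in range(m):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a class becomes empty
                centroids[c] = mean(members)
    return labels


def build_embedding_tree(emb, freq, m, min_count=3):
    """Hypothetical sketch of steps 2-5; emb maps words to Word2Vec vectors (step 1)."""
    frequent = [w for w in emb if freq.get(w, 0) > min_count]
    remaining = [w for w in emb if freq.get(w, 0) <= min_count]
    # Step 2: k-means over the frequent words.
    labels = kmeans([emb[w] for w in frequent], m)
    classes = {c: [w for w, lab in zip(frequent, labels) if lab == c] for c in range(m)}
    # Step 3: class embedding = mean of the member word embeddings.
    class_emb = {c: mean([emb[w] for w in ws]) for c, ws in classes.items() if ws}
    # Step 4: attach each remaining word to its most cosine-similar class (<UNK_j>).
    unk_members = {c: [] for c in class_emb}
    for w in remaining:
        best = max(class_emb, key=lambda c: cosine(emb[w], class_emb[c]))
        unk_members[best].append(w)
    # Step 5: <UNK_j> embedding = mean of the remaining words attached to class j.
    unk_emb = {c: mean([emb[w] for w in ws]) for c, ws in unk_members.items() if ws}
    return classes, class_emb, unk_members, unk_emb
```

With real data, emb would hold the Word2Vec vectors of the candidate words and M would be chosen empirically; the toy vectors in a quick check only need two well-separated groups.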
FSR Generation Algorithm [10] 
Begin 


1. To address data sparseness, an input vector is constructed for each rare word; this helps incorporate contextual
information and overcome the limitations posed by scarce data.


2. The system distinguishes between class information and word information for rare words. This differentiation 
allows for a more nuanced representation of rare words, considering both their inherent properties as individual 


words and their classification within a broader context. 


3. The system computes separate input vectors for the source and target rare words. This incorporates contextual
information specific to the source and target languages, enabling more accurate and meaningful translations of
rare words.


$$E_x = \sum_{j=1}^{2} U_x E_{p_j} \qquad (1)$$

$$E_y = \sum_{j=1}^{2} U_y E_{p_j} \qquad (2)$$


• The embedding of the class node in the path, denoted E_{p1}, and the embedding of the word node in the path,
denoted E_{p2}, are used in the computation. Additionally, randomly initialized weight matrices U_x and U_y are
employed [12].


• These terms define how the input vectors of the source and target rare words are computed in Equations (1)
and (2) [13].
End 
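The input-vector computation in Equations (1) and (2) can be sketched as follows. This rests on an assumption: the equations are read as a sum over the two node embeddings on the rare word's tree path (the class node E_{p1} and the word node E_{p2}), each multiplied by a randomly initialized matrix, U_x on the source side and U_y on the target side. The helper names and the toy embedding values are hypothetical.

```python
import random


def matvec(U, v):
    """Multiply matrix U (a list of rows) by vector v."""
    return [sum(u * x for u, x in zip(row, v)) for row in U]


def random_matrix(rows, cols, seed):
    """Randomly initialized weight matrix standing in for U_x or U_y."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def fsr_input_vector(U, path_embeddings):
    """E = sum_j U E_{p_j} over the node embeddings on the rare word's tree path."""
    out = [0.0] * len(U)
    for e in path_embeddings:
        out = [o + t for o, t in zip(out, matvec(U, e))]
    return out


d = 4
U_x = random_matrix(d, d, seed=1)  # source side
U_y = random_matrix(d, d, seed=2)  # target side
class_node = [0.2, 0.1, 0.0, 0.4]  # E_{p1}: class-node embedding (hypothetical values)
word_node = [0.1, 0.3, 0.2, 0.0]   # E_{p2}: word-node embedding (hypothetical values)
E_x = fsr_input_vector(U_x, [class_node, word_node])
E_y = fsr_input_vector(U_y, [class_node, word_node])
```

Under this reading the same path yields different source and target input vectors only through the separate matrices, which matches step 3's description of separate vectors for source and target rare words.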


Latent Topic Representation [11] 


Figure 2. CNN model to represent the topic as LTR (input layer, convolution layer, max-pooling layer, and output layer with T_m = tanh(P_m))

LTR Algorithm 

// Extracting key information from the sentence for better translation [14], [15]
INPUT:

{T_1, T_2, ..., T_M}: the M topics

X = {x_1, x_2, ..., x_J}: the input sentence (word embeddings)

OUTPUT:

Topic distribution of the sentence (topic context vector)

Begin

1. The J×D input matrix is passed to 2M convolution filters, each spanning 3 consecutive rows,
where D is the dimension of the word vector and J is the sentence length.

2. The max-pooling layer applies M row-wise max operations to the output of the convolution layer.
3. A tanh function is applied to the max-pooling output to generate the output layer T.

4. Finally, T is mapped to a {Key, Value} pair.


End 
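Steps 1 to 3 of the LTR algorithm can be sketched without a deep-learning framework. The algorithm does not say how the M max operations pair up with the 2M feature maps, so this sketch assumes P_m is the maximum over feature maps 2m and 2m+1 across all window positions; that pairing, the filter values, and the function name are assumptions for illustration only. Step 4, the {Key, Value} mapping, is not shown.

```python
import math
import random


def cnn_topic_vector(X, filters):
    """Sketch of the LTR CNN: X is a J x D matrix, filters holds 2M filters of shape 3 x D."""
    J, D = len(X), len(X[0])
    if J < 3 or len(filters) % 2:
        raise ValueError("need at least 3 words and an even number (2M) of filters")
    # Convolution layer: each filter slides over windows of 3 consecutive word rows.
    C = [
        [sum(W[r][d] * X[j + r][d] for r in range(3) for d in range(D)) for j in range(J - 2)]
        for W in filters
    ]
    # Max-pooling layer (assumed pairing): P_m = max over feature maps 2m and 2m+1.
    M = len(filters) // 2
    P = [max(max(C[2 * m]), max(C[2 * m + 1])) for m in range(M)]
    # Output layer: T_m = tanh(P_m), one value per topic.
    return [math.tanh(p) for p in P]


rng = random.Random(0)
J, D, M = 6, 8, 4  # sentence length, embedding size, number of topics (illustrative)
X = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(J)]
filters = [[[rng.gauss(0, 0.1) for _ in range(D)] for _ in range(3)] for _ in range(2 * M)]
T = cnn_topic_vector(X, filters)  # topic context vector of length M
```

Max pooling over positions makes T independent of sentence length, so sentences of any length J ≥ 3 map to a fixed-size topic vector.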
3. Module Outputs


This section discusses the results obtained from the NMT system for German-to-English translation. Snapshots of
the preprocessing, the word tokenizer, and the translated sentences are included. The Google Translate module is
integrated to check whether the German sentences are correctly translated into their corresponding English
sentences. The model is trained on a corpus limited to 50,000 sentence pairs. The <start> and <end> tags are
attached to the beginning and end of every sentence, respectively, so that the encoder and decoder know where each
sentence starts and finishes. Special characters are removed from the sentences using regular expressions (RE) to
clean the data.
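The cleaning and tagging described above can be sketched with Python's re module. This is an illustrative sketch, not the study's code: the exact set of punctuation kept and the decision to keep German umlauts and ß are assumptions, chosen so that the output matches the sample sentences shown in Figure 3.

```python
import re


def preprocess_sentence(sentence):
    """Lower-case, separate punctuation, strip other special characters, add tags."""
    s = sentence.lower().strip()
    # Put spaces around the punctuation we keep, so "go." becomes "go ."
    s = re.sub(r"([?.!,¿])", r" \1 ", s)
    # Replace every other character (e.g. apostrophes) with a space; German letters are kept.
    s = re.sub(r"[^a-zäöüß?.!,¿]+", " ", s)
    s = re.sub(r"\s+", " ", s).strip()
    # <start> marks where the sentence begins and <end> where it finishes.
    return "<start> " + s + " <end>"


print(preprocess_sentence("Let's go!"))  # <start> let s go ! <end>
```

Replacing the apostrophe with a space is what turns "let's" into the "let s" seen in the corpus samples.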


Sample Data from the Corpus


en, de = create_dataset(path_to_file, num_examples)  # num_examples (hypothetical name): sentence pairs to load
print(en[0])
print(de[0])
print(en[500])
print(de[500])


<start> go . <end> 
<start> geh . <end> 


<start> be kind . <end> 
<start> sei nett ! <end> 


<start> let s go ! <end> 
<start> lass uns gehen ! <end> 


Figure 3. Sample sentences from the parallel corpus, printed after preprocessing
Word Tokenizer 


Every word in a sentence is mapped to a number defined in the dictionary (vocabulary). Figure 4 shows the mapping
of sample preprocessed German and English sentences to their corresponding indices [5]. The words in each sentence
are then converted to their corresponding vectors using one-hot encoding with the Word2Vec model. Tagging words
with indices helps maintain the position of each word in the sentence on both the source and the target side. The
positional encoding captures the position of each word in the sentence using sine and cosine functions.


Figure 4. Tagging pre-processed sentences with indices (input-language and target-language index-to-word mappings, e.g. 1 ----> <start>, 2 ----> <end>)
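The word_index attribute used in the hyper-parameter settings suggests a Keras-style tokenizer, which ranks words by frequency and numbers them from 1. The pure-Python stand-in below reproduces that ordering under this assumption; the function names are hypothetical. Because <start> and <end> occur in every sentence, they receive indices 1 and 2, as in Figure 4, and index 0 stays free for padding.

```python
from collections import Counter


def build_word_index(sentences):
    """Map each word to an integer index, most frequent first, starting at 1."""
    counts = Counter(word for s in sentences for word in s.split())
    # most_common() orders equal counts by first occurrence
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def to_padded_ids(sentences, word_index):
    """Convert sentences to index sequences and post-pad them with 0."""
    seqs = [[word_index[w] for w in s.split()] for s in sentences]
    max_len = max(len(q) for q in seqs)
    return [q + [0] * (max_len - len(q)) for q in seqs]


english = ["<start> go . <end>", "<start> be kind . <end>", "<start> let s go ! <end>"]
word_index = build_word_index(english)
index_word = {i: w for w, i in word_index.items()}
print(index_word[1], index_word[2])  # <start> <end>
print(to_padded_ids(english, word_index)[0])  # [1, 3, 4, 2, 0, 0]
```

Reserving 0 for padding is also why the vocabulary sizes are computed as len(word_index) + 1.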
Hyper-parameter Optimization


Hyper-parameters, such as the number of stacked encoder and decoder layers, are tuned to improve translation
accuracy, maximizing the BLEU score and minimizing the TER score. The parameters that affect the system are
BUFFER_SIZE, BATCH_SIZE, embedding_dim, units, vocab_inp_size, and vocab_tar_size.


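BUFFER_SIZE = len(input_tensor_train)  # buffer holds the entire training set
BATCH_SIZE = 64  # sentence pairs per batch
steps_per_epoch = len(input_tensor_train) // BATCH_SIZE  # full batches per epoch
embedding_dim = 128  # dimension of each word embedding
units = 1024  # number of hidden units in the model layers
vocab_inp_size = len(inp_lang.word_index) + 1  # +1 reserves index 0 for padding
vocab_tar_size = len(targ_lang.word_index) + 1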


Figure 5. Hyper-parameters for Transformer Architecture 
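As a worked example of the steps_per_epoch setting, assume input_tensor_train holds exactly the training portion of the 80:20 split of the 50K sentence pairs. Under that assumption the arithmetic is:

```python
total_pairs = 50_000                         # sentence pairs taken from the dataset
train_pairs = total_pairs * 80 // 100        # 80:20 split -> training portion
BATCH_SIZE = 64
steps_per_epoch = train_pairs // BATCH_SIZE  # full batches per epoch
print(train_pairs, steps_per_epoch)  # 40000 625
```

Since 40,000 is an exact multiple of 64, the integer division drops no training pairs.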


Sentence Translation 


The Transformer-based NMT system is used to translate German sentences into English, and the results closely
approximate the output of Google Translate. An attention graph is plotted for each translation, showing how the
attention weights use the sentence context to improve the result. Brighter cells in the graph mark the source
word that each target word corresponds to most closely.
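The attention graph visualizes the weight matrix of scaled dot-product attention, softmax(QK^T / sqrt(d_k)), from the standard Transformer. The sketch below assumes encoder-decoder attention, with one query per target word and one key/value per source word. The toy Q, K, and V values are invented, and the brightest cell in each row is taken to be the highest weight.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for lists of row vectors.

    Q: one query per target word; K, V: one key/value per source word.
    Returns the context vectors and the weight matrix (rows = target, cols = source).
    """
    d_k = len(K[0])
    weights = [
        softmax([sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in K])
        for q_row in Q
    ]
    context = [
        [sum(w * v_row[c] for w, v_row in zip(w_row, V)) for c in range(len(V[0]))]
        for w_row in weights
    ]
    return context, weights


Q = [[1.0, 0.0], [0.0, 1.0]]  # two target-word queries (toy values)
K = [[4.0, 0.0], [0.0, 4.0]]  # two source-word keys
V = [[1.0, 2.0], [3.0, 4.0]]  # source-word values
context, weights = scaled_dot_product_attention(Q, K, V)
brightest = [max(range(len(row)), key=row.__getitem__) for row in weights]
print(brightest)  # [0, 1]
```

Each row of weights sums to 1, so plotting it as a heat map shows how one target word distributes its attention across the source sentence.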
Test cases 


A test case is a document that specifies a set of conditions or actions performed on the proposed system software
to verify the expected functionality of a feature. The test cases check whether sentences are translated correctly,
or with close approximation at the word level, and whether rare words are handled.
Inference 


For the sample test cases, the actual output obtained from the Google Translate API for each German sentence is
compared with the predicted output of our system. The attention graph plots the attention weights of the German
sentence with respect to its English translation, and its brightest regions indicate the most likely translation
of each word. In most cases, the output closely matches the Google Translate output. Because sentence-level context
is used, the translated output contains words with meanings similar to those in the actual output. Rare words are
translated from their contextual meaning through the Transformer's attention mechanism, which uses queries, keys,
and values. The encoder-decoder model uses the sentence context as the LTR and the FSR for OOV rare words to produce
better German-to-English translations. The translation system passed 9 of the 10 test cases.
Training Time 


Table 1 lists the total time taken by all batches in each epoch. Each epoch trains 6 batches of size 100, and the
per-epoch time is the sum of the batch times. The model is run on a GPU with 25 GB of RAM, and times are given in
seconds (sec). The training time increases slightly with every 5 epochs.
Table 1. Training time for epochs


Epochs    Time (sec)
11-15     193.3618
21-25     194.2428


4. Conclusion


In this research, we develop an end-to-end language translation framework that uses Transformer-based NMT to
convert German sentences into English. The system considers the context of a sentence to ensure that the
translation retains the same meaning as the source sentence. The Transformer's self-attention mechanism is employed
to increase the accuracy of the translated sentences. To handle rare words that are out of vocabulary (OOV), we
integrate Fuzzy Semantic Representation (FSR) with the Transformer model. Additionally, we incorporate a
convolutional neural network (CNN) to capture the context of the sentence as a Latent Topic Representation (LTR).
Modeling the sequence-to-sequence system for the dataset is time-consuming. However, the experiments show that the
proposed NMT system achieves a higher BLEU score than conventional machine translation (MT) systems and
significantly reduces the Translation Error Rate (TER).